An Application of Neural Networks to Sequence Analysis and Genre Identification

Author

  • David Bisant
Abstract

This study borrowed sequence analysis techniques from the genetic sciences and applied them to a similar problem in email filtering and web searching. Genre identification is the process of determining the type or family of a given document. For example, is the document a letter, a news story, a horoscope, a joke, or an advertisement? Genre identification allows a computer user to filter email and web sites in a way that is entirely different from topic-based methods. This study presents original research in an application of neural networks to the genre identification problem. The data for the study came from a database constructed by the author and his colleagues. The data consisted of descriptive features and the genre classification, as judged by a human, from over 5000 different documents. Ten different genres were represented. The descriptive features consisted of 89 different measurements of each document, such as average word length, the number of numeric terms, and the proportion of present tense verbs. The data was divided into two sets, with 75% for training and 25% for testing. The first neural network applied was a very basic single-layer network that achieved 79% correct classifications on the testing data. This performance was equivalent to that of the previous best method on the problem, decision trees. When more complex neural networks were applied to the problem, performance increased significantly. The best performance of 86% correct classifications was achieved by a network with a single hidden layer of 300 units. Increasing the number of hidden layers or changing the number of hidden units did not improve performance. A weight decay process also did not improve performance. The analysis of the features indicated that second-order information was being exploited by the networks for better performance. This means that neural networks will outperform statistical models or other methods that only utilize first-order information.
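The setup described in the abstract (89 features per document, a single hidden layer of 300 units, ten genres, a 75%/25% split) can be sketched roughly as follows. This is only an illustrative sketch, not the author's implementation: the abstract does not state the activation functions, weight initialization, or training procedure, so the tanh hidden units, softmax output, uniform initialization, and all function names here are assumptions, and the network is left untrained.

```python
import math
import random

N_FEATURES = 89   # descriptive measurements per document (from the abstract)
N_HIDDEN = 300    # hidden units in the best-performing network (from the abstract)
N_GENRES = 10     # number of genres represented (from the abstract)


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (initialization is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_layer, output_layer):
    """89 features -> 300 tanh hidden units -> 10 genre probabilities."""
    hidden = [math.tanh(v) for v in dense(x, hidden_layer)]
    return softmax(dense(hidden, output_layer))


rng = random.Random(0)
hidden_layer = init_layer(N_FEATURES, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_GENRES, rng)

# Placeholder data: 5000 stand-in document ids and one random feature vector.
document_ids = list(range(5000))
train_ids, test_ids = split_train_test(document_ids)
example_features = [rng.random() for _ in range(N_FEATURES)]
genre_probs = forward(example_features, hidden_layer, output_layer)
predicted_genre = genre_probs.index(max(genre_probs))
```

The nonlinear hidden layer is what allows such a network to combine pairs of features, which is one way to read the abstract's remark about second-order information; a single-layer network computes only weighted sums of individual features.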


Similar resources

Comparison Study on Neural Networks in Damage Detection of Steel Truss Bridge

This paper presents the application of three main Artificial Neural Networks (ANNs) in damage detection of steel bridges. This method can indicate damage in structural elements due to a localized change of stiffness, called a damage zone. The changes in structural response are used to identify the states of structural damage. To circumvent the difficulty arising from the non-linear n...


Neural Network Sensitivity to Inputs and Weights and its Application to Functional Identification of Robotics Manipulators

Neural networks are applied to the system identification problems using adaptive algorithms for either parameter or functional estimation of dynamic systems. In this paper the neural networks' sensitivity to input values and connections' weights, is studied. The Reduction-Sigmoid-Amplification (RSA) neurons are introduced and four different models of neural network architecture are proposed and...


Flood Forecasting Using Artificial Neural Networks: an Application of Multi-Model Data Fusion technique

Floods are among the natural disasters that cause human hardship and economic loss. Establishing a viable flood forecasting and warning system for communities at risk can mitigate these adverse effects. However, establishing an accurate flood forecasting system is still challenging due to the lack of knowledge about the effective variables in forecasting. The present study has indicated that th...


AN INTELLIGENT FAULT DIAGNOSIS APPROACH FOR GEARS AND BEARINGS BASED ON WAVELET TRANSFORM AS A PREPROCESSOR AND ARTIFICIAL NEURAL NETWORKS

In this paper, a fault diagnosis system based on discrete wavelet transform (DWT) and artificial neural networks (ANNs) is designed to diagnose different types of fault in gears and bearings. DWT is an advanced signal-processing technique for fault detection and identification. Five features of wavelet transform (RMS, crest factor, kurtosis, standard deviation and skewness) of discrete wavelet co...



Journal title:
  • IJPRAI

Volume 19  Issue 

Pages  -

Publication date 2004